Andrew Zammit Mangion
09 October 2014
Environmental systems are characterised by phenomena which evolve both in space and in time (weather and climate, glacial flow, spread of invasive species etc.).
The aim of spatio-temporal modelling is to
\( \def\bold#1{\mathbf #1} \) \( \def\yvec{\bold{y}} \) \( \def\svec{\bold{s}} \) \( \def\vvec{\bold{v}} \) \( \def\wvec{\bold{w}} \) \( \def\gammab{\bold{\gamma}} \) \( \def\thetab{\bold{\theta}} \) \( \def\kvec{\bold{k}} \) \( \def\phib{\bold{\phi}} \) \( \def\psib{\bold{\psi}} \) \( \def\evec{\bold{e}} \) \( \def\xvec{\bold{x}} \) \( \def\Qmat{\bold{Q}} \) \( \def\Amat{\bold{A}} \) \( \def\betab{\bold{\beta}} \) \( \def\Lmat{\bold{L}} \) \( \def\Thetab{\bold{\Theta}} \) \( \def\Jmat{\bold{J}} \) \( \def\Cmat{\bold{C}} \) \( \def\Tmat{\bold{T}} \)
Divide and conquer approach to a complex problem.
\[ \yvec_t = g(X_t(\svec), \vvec_t; \thetab_h) \]
\[ X_t(\svec) = f(\svec,t,\wvec_t(\svec); \thetab_m) \]
\[ p(\thetab_m ; \gammab_m), p(\thetab_h; \gammab_h), p(\thetab_v ; \gammab_v), p(\thetab_w; \gammab_w) \]
Two approaches:
\[ X_t(\svec) \sim GP(\mu_t(\svec),K(t,t',\svec,\svec')) \]
General-purpose model, used when the dynamic structure is largely unknown.
\[ X_t(\svec) = f(X_{t-1}(\svec), X_{t-2}(\svec), \dots, \wvec_t(\svec); \thetab_m) \]
\[ \frac{\partial X(s;t)}{\partial t} - \beta \frac{\partial^2X(s;t)}{\partial s^2} + a X(s;t) = e(s;t) \]
where \( \alpha >0, \beta > 0 \) and \( e(s;t) \) is a random, zero mean error process. Then the geo-statistical approach requires use of covariance function:
\[ K(u;v) = \frac{K(0;0)}{2}\left(e^{-u^{\left( \frac{\alpha}{\beta}\right)^\frac{1}{2}}} Erfc\left(\frac{2v\left( \frac{\alpha}{\beta}\right)^\frac{1}{2} - \frac{u}{\beta}}{2\left( \frac{v}{\beta}\right)^\frac{1}{2}} \right) + \\ e^{u^{\left( \frac{\alpha}{\beta}\right)^\frac{1}{2}}} Erfc\left(\frac{2v\left( \frac{\alpha}{\beta}\right)^\frac{1}{2} + \frac{u}{\beta}}{2\left( \frac{v}{\beta}\right)^\frac{1}{2}}\right) \right) \]
Continuous-time ST processes are computationally tedious to deal with.
Focus on models of the form
\[ X_t(\svec) = f(X_{t-1}(\svec)) + e_t(\svec) \]
where \( ~e_t(\svec)~ \) is the solution to
\[ (\kappa^2 - \Delta)^{\alpha/2}(\tau e(\svec)) = \mathcal{W(\svec)} \]
where \( \mathcal{W(\svec)} \) is a spatial (white) Wiener process.
The SPDE
\[ (\kappa^2 - \Delta)^{\alpha/2}(\tau e(\svec)) = \mathcal{W(\svec)} \]
has as solution (Whittle, 1954)
\[ e(\svec) \sim \mathcal{GP}(0,K(\svec,\svec'; \alpha,\tau,\kappa)) \]
where \( k(\cdot,\cdot) \) is a Matérn covariance function. The wave spectrum of \( e(\svec) \) is given by
\[ R(\kvec) = (2\pi)^{-\alpha}(\kappa^2 + \| \kvec \|^2)^{-\alpha} \]
Non-stationarity can be handled by making the SPDE spatially varying. This results in a covariance function which is not a Matérn field, but do we care?
We can model SPDEs on manifolds, not restricted to \( \mathbb{R}^2 \).
SPDEs can be approximated to arbitrary precision using an appropriate basis (Lindgren, 2011).
\[ X_{t+1}(\svec) = \mathcal{A}X_t(\svec) + e_t(\svec) \]
\[ \langle \phib,X_{t+1}(\svec) \rangle = \langle \phib,\mathcal{A}X_{t+1}(\svec) \rangle + \langle \phib,e_t(\svec)\rangle \]
\[ \langle \phib,\psib^T \rangle\xvec_{t+1} = \langle \phib,\mathcal{A}\psib^T \rangle\xvec_t + \langle \phib,\psib^T\rangle\evec_t \]
For \( \alpha = 2 \) and \( \psib = \phib \) (Galerkin method):
\[ [ \langle\phib, \kappa^2\phib^T\rangle - \langle\phib, \Delta\phib^T\rangle)]\tau \evec = \langle \phib,\mathcal{W(\svec)} \rangle \]
\[ \Large \evec \sim \mathcal{N}(\bold{0},\Qmat^{-1}) \] where \( \Qmat \) is sparse.
(Lindgren, 2014)
Useful when we have multiple interacting spatio-temporal processes.
Consider the bi-variate sub-system of random forcings: \( e_{1,t}(\svec), e_{2,t}(\svec) \). Define the multi-variate Gaussian process (MVGP) as
\[ (e_{1,t}(\svec), e_{2,t}(\svec)) = \bold{e}(\svec) \sim \mathcal{MVGP}(\bold{0},\bold{K}(\svec,\svec',t,t';\thetab_K)) \]
where \( \bold{K}(\svec,\svec',t,t';\thetab_K) \) is a \( 2 \times 2 \) cross-covariance matrix function (i.e.~ each element of \( \bold{K} \) is a covariance function (Finley et al., 2007):
\[ \bold{K}(\svec,\svec',t,t') = [Cov(e_{i,t}(\svec), e_{i,t'}(\svec'))]_{i,j \in \{1,2\}} \]
\[ \begin{bmatrix} \mathcal{A}_1 & & \\ & & \mathcal{A}_2 & \mathcal{A}_{2,3} \\ & & \mathcal{A}_{3,2} & \mathcal{A}_3 \end{bmatrix} \begin{bmatrix} e_1(\svec) \\ e_2(\svec) \\ e_3(\svec) \end{bmatrix} = \begin{bmatrix} \mathcal{W}_1(\svec) \\ \mathcal{W}_2(\svec) \\ \mathcal{W}_3(\svec) \end{bmatrix} \]
SPDEs are a useful tool in modelling complex phenomena.
Real power comes in using finite elements to decompose and represent them as GMRFs.
From now on we restrict ourselves to the linear, Gaussian case
\[ \begin{bmatrix} \xvec_{1,t} \\ \xvec_{2,t} \\ \vdots \end{bmatrix} = \begin{bmatrix} \Amat_{1} & \Amat_{1,2} & \dots \\ \Amat_{2,1}& \Amat_{2,2} & \dots \\ \vdots& \vdots & \ddots \\ \end{bmatrix} \begin{bmatrix} \xvec_{1,t-1} \\ \xvec_{2,t-1} \\ \vdots \end{bmatrix} + \bold{J}_t\bold{\beta} + \begin{bmatrix} \evec_{1,t} \\ \evec_{2,t} \\ \vdots \end{bmatrix} \]
Collapse all variates, and their spatial and temporal components into one big GMRF.
Sparsity is retained and one can make use of fill-in reduction permutations when computing the Cholesky factor (Knorr-Rue and Held, 2002)
\[ p(\xvec,\betab_I | \Thetab) = \left(\prod_{i=1}^T{p(\xvec_{t} | \xvec_{t-1},\betab,\Thetab)}\right)p(\xvec_0 | \betab, \Thetab)p(\betab_I) \]
\[ \xvec_0|\betab_I,\Thetab \sim \mathcal{N}(\bold{J}_0\betab,[\Qmat_w - \Amat^T\Qmat_w\Amat]^{-1}) \]
\( (\xvec,\betab) \sim \mathcal{N}(\bold{0},\Qmat^{-1}) \) where
\[ \small \begin{array}{cc} \Qmat = \begin{bmatrix} \Qmat_1 & \Qmat_2 \\ \Qmat_2^T & \Qmat_3 \end{bmatrix} & \Qmat_2 = \begin{bmatrix} \Amat^T\Qmat_w\Jmat_{1} - \Qmat_0\Jmat_{0} \\ \Amat^T\Qmat_w\Jmat_{2} - \Qmat_w\Jmat_1 \\ \vdots\end{bmatrix}, \end{array} \]
\[ \small \Qmat_1 = \begin{bmatrix} \Amat^T\Qmat_w\Amat + \Qmat_0 & -\Amat^T\Qmat_w & ~~~ \\ -\Qmat_w\Amat & \Amat^T\Qmat_w\Amat + \Qmat_w& -\Amat^T\Qmat_w & ~~~ \\ & -\Qmat_w\Amat & \ddots & \ddots \\ & \ddots & \ddots & \ddots & \end{bmatrix} \]
\[ \small \Qmat_3 = \Jmat_0^T\Qmat_0\Jmat_0 + \sum_{i=1}^{T} (\Jmat_i^T\Qmat_w\Jmat_i) + \Qmat_\beta \]
\[ \begin{align} \yvec &= \Cmat \xvec + \vvec \\ \vvec &\sim \mathcal{N}(0,\Tmat^{-1}) \\ \xvec &\sim \mathcal{N}(0,\Qmat^{-1}) \end{align} \]
Lemma 1: (Erisman and Tinney, 1975) Let \( \Lmat = chol(\Qmat) \). If \( L_{ij} \ne 0 \), then \( \Sigma_{ij} \) can be computed using elements in
- \( \Lmat \) and
- \( \Sigma^*_{IJ}, (I,J) = \{ i', j'; i' > i, j' > j, \Lmat_{i'j'}, \ne 0 \} \)
Theorem 1: Let \( \yvec = \Cmat\xvec + \vvec \), \( \vvec \sim \mathcal{N}(0,\Tmat^{-1}) \) and \( \xvec \sim \mathcal{N}(0,\Qmat^{-1}) \). Then if
- \( \textrm{zeros}(\Amat) \subseteq \textrm{zeros}(\Cmat) \)
- \( \Amat \) is non-negative
- \( \Tmat \) is diagonal, then \( diag(\Amat \Sigma^* \Amat^T) \equiv diag(\Amat \Sigma^*_{00} \Amat^T) \)
In practice the Takahashi equations are always used with GMRFs in order to obtain the marginal variances. Theorem 2 provides a means to obtain uncertainty on linear combinations essentially for free.
I can provide predictive uncertainties at any point provided that each row of \( zeros(\Amat) \subseteq zeros(\Cmat) \).
Asub <- getC(e[[1]])[1:5000,]
system.time({ est <-diag( Asub %*% Results$Partial_Cov %*% t(Asub)) })
> user system elapsed
> 5.232 0.465 5.698
Lp <- Results$Qpermchol
P <- Results$P
system.time({X <- solve(Lp, t(P)%*%t(Asub))
est <- crossprod(X,X)} )
> user system elapsed
> 221.250 0.382 221.688
Lemma 2: If \( \Amat \) is non-negative then \( [\Amat^T\Amat]_{jk} = 0 \Leftrightarrow diag(\Amat\Sigma^*\Amat^T) \) does not depend on \( \Sigma^*_{jk} \)
\[ \begin{align} zeros(\Qmat^*) &= zeros(\Qmat + \Amat^T\Tmat^{-1}\Amat) \\ &\ge zeros(\Amat^T\Tmat^{-1}\Amat) = zeros(\Amat^T\Amat) \end{align} \]
\[ zeros(\Lmat^*)_{jk} \ge zeros(\Qmat^*)_{jk} \\ \ge zeros(\Amat^T\Amat) \]
\[ zeros(\Sigma_{00}) \ge zeros(\Lmat^*)_{jk} \]
\[ x_i \perp \xvec_{-\{i,ne(i)\}} | \xvec_{ne(i)} ~~~ \forall i \in \mathcal{V} \]
\[ E(x_i | \xvec_{-i}) = -\frac{1}{Q_{ii}}\sum_{j:j\sim i}Q_{ij}x_j; ~~~ prec(x_i | \xvec_{-i}) = Q_{ii} \]
Theorem 3: Any planar map divided into contigious regions can be coloured using at most FOUR colours (Appel and Haken, 1976)
Theorem 3: Any planar map divided into contigious regions can be coloured using at most FOUR colours (Appel and Haken, 1976)
Theorem 3: Any planar map divided into contigious regions can be coloured using at most FOUR colours (Appel and Haken, 1976)
\[ \xvec_A \perp \xvec_B | \xvec_{C} \]
\( \forall A,B,C \notin \varnothing \) and \( C \) separating \( A,B \).
In typical Bayesian networks, finding \( A, B \) and \( C \) can be difficult (seed and spawn idea, Gonzales, 2011).
With a spatial network, this is remarkably easy using the Kernighan-Lin algorithm for graph partitioning (Kernighan and Lin, 1970).
Let \( G(V,E) \) be a graph, and let \( V \) be the set of nodes and \( E \) the set of edges.
Find a partition of two disjoint, equal-size subsets \( A,B \), such that the sum of the weights of the edges between nodes in A and B is minimized.
Setting edge weights = 1, gives a bisection algorithm!
Sampling from the prior guarantees enormous speed-ups using MPI.
Observations with large spatial footprints cause the four-colour theorem to fail.
\[ \begin{align} \widehat\Qmat^{ii} \hat\xvec^i &= \Cmat^{i^T}\Tmat^{-1}\left(\yvec - \sum_{j \in [1\dots d]/i}\Cmat^j\hat\xvec^j\right) - \\ &~~~~ \sum_{j \in [1\dots d]/i} \left(\Qmat^{ij}\hat\xvec^j \right) \end{align} \]
No two adjacent nodes in a super-graph can have the same colour and no two nodes of the same colour can be spanned by the same observation.
Multi-variate spatio-temporal statistics has a lot to offer in a large range of environmental problems which remain untapped.
Inference with large-scale spatio-temporal problems is hard but usually possible (INLA, VB, MCMC). However we need more with multi-variate systems.
Many computational benefits to GMRFs and parallelisation which remain unexploited.
Parallel sampling can also become intractable, what about parallel approximate Bayesian sparse methods? http://arxiv.org/abs/1305.4152
Jonathan Rougier (University of Bristol)
Botond Cseke (University of Edinburgh)
Finn Lindgren (University of Bath)